High-Level Topology-Oblivious Optimization of MPI Broadcast Algorithms on Extreme-Scale Platforms

Authors

  • Khalid Hasanov
  • Jean-Noël Quintin
  • Alexey L. Lastovetsky
Abstract

There has been significant research on collective communication operations, in particular MPI broadcast, on distributed-memory platforms. Most of this work optimizes collective operations for particular architectures by taking into account either their topology or their platform parameters. In this work we propose a simple yet general approach to optimizing the legacy MPI broadcast algorithms that are widely used in MPICH and OpenMPI. We present theoretical analysis and experimental results on IBM BlueGene/P and on a cluster of the Grid’5000 platform.

Similar resources

Topology-oblivious optimization of MPI broadcast algorithms on extreme-scale platforms

Keywords: MPI Broadcast BlueGene Grid'5000 Extreme-scale Communication Hierarchy. Abstract: Significant research has been conducted in collective communication operations, in particular in MPI broadcast, on distributed memory platforms. Most of the research efforts aim to optimize the collective operations for particular architectures by taking into a...

Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms

Multicore clusters, which have become the most prominent form of High Performance Computing (HPC) systems, challenge the performance of MPI applications with non-uniform memory accesses and shared cache hierarchies. Recent advances in MPI collective communications have alleviated the performance issue exposed by deep memory hierarchies by carefully considering the mapping between the collective...

Locality and Topology Aware Intra-node Communication among Multicore CPUs

A major trend in HPC is the escalation toward manycore, where systems are composed of shared-memory nodes featuring numerous processing units. Unfortunately, with scale comes complexity, here in the form of non-uniform memory accesses and cache hierarchies. For most HPC applications, harnessing the power of multicores is hindered by the topology-oblivious tuning of the MPI library. In this pape...

Collective Framework and Performance Optimizations to Open MPI for Cray XT Platforms

The performance and scalability of collective operations play a key role in the performance and scalability of many scientific applications. Within the Open MPI code base we have developed a general-purpose hierarchical collective operations framework called Cheetah, and applied it at large scale on the Oak Ridge Leadership Computing Facility’s (OLCF) Jaguar platform, obtaining better performa...

Broadcast Routing in Wireless Ad-Hoc Networks: A Particle Swarm Optimization Approach

When routing in multi-hop packet radio networks (static ad-hoc wireless networks), it is crucial to minimize power consumption, since nodes are powered by batteries of limited capacity and it is expensive to recharge the device. This paper studies the problem of broadcast routing in radio networks. Given a network with an identified source node, any broadcast routing is considered as a directed...

Publication date: 2014